101 research outputs found

    Multi-task Deep Neural Networks in Automated Protein Function Prediction

    In recent years, deep learning algorithms have outperformed state-of-the-art methods in several areas, thanks to efficient methods for training and for preventing overfitting, advances in computer hardware, and the availability of vast amounts of data. The high performance of multi-task deep neural networks in drug discovery has drawn attention to deep learning algorithms in bioinformatics. Here, we proposed a hierarchical multi-task deep neural network architecture based on Gene Ontology (GO) terms as a solution to the protein function prediction problem and investigated various aspects of the proposed architecture through several experiments. First, we showed that there is a positive correlation between the performance of the system and the size of the training datasets. Second, we investigated whether the level of GO terms in the GO hierarchy is related to their performance, and showed that there is no relation between the depth of GO terms in the GO hierarchy and their performance. In addition, we included all annotations in the training of a set of GO terms to investigate whether adding noisy data to the training datasets changes the performance of the system. The results showed that including less reliable annotations in the training of deep neural networks significantly increased the performance of the poorly performing GO terms. We evaluated the performance of the system using a hierarchical evaluation method. The Matthews correlation coefficient was calculated as 0.75, 0.49 and 0.63 for the molecular function, biological process and cellular component categories, respectively. We showed that deep learning algorithms have great potential in protein function prediction. We plan to further improve the proposed system, DEEPred, by including other types of annotations from various biological data sources, and to release DEEPred as an open access online tool. Comment: 19 pages, 4 figures, 4 tables
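The Matthews correlation coefficient quoted in this abstract is computed from the four cells of a binary confusion matrix. The following is a minimal Python sketch of that standard formula only; the function name is illustrative, and the hierarchical evaluation procedure used by the authors is not reproduced here.

```python
import math


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from binary confusion-matrix counts.

    Returns 0.0 when the denominator is zero (a common convention,
    adopted here as an assumption).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom
```

A value of 1 indicates perfect agreement between predictions and labels, 0 indicates no better than chance, and -1 indicates total disagreement.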

    A signal transduction score flow algorithm for cyclic cellular pathway analysis, which combines transcriptome and ChIP-seq data

    Determination of cell signalling behaviour is crucial for understanding the physiological response to a specific stimulus or drug treatment. Current approaches for large-scale data analysis do not effectively incorporate critical topological information provided by the signalling network. We herein describe a novel model- and data-driven hybrid approach, the signal transduction score flow algorithm, which allows quantitative visualization of cyclic cell signalling pathways that lead to ultimate cell responses such as survival, migration or death. The score flow algorithm represents signalling pathways as directed graphs and maps experimental data, including negative and positive feedbacks, onto gene nodes as scores, which then computationally traverse the signalling pathway until a pre-defined biological target response is attained. Initially, experimental data-driven enrichment scores of the genes in a pathway were computed. A heuristic approach was then applied, using the gene score partition as a solution for protein node stoichiometry during dynamic scoring of the pathway of interest. Incorporating a score partition during the signal flow, together with cyclic feedback loops in the signalling pathway, significantly improves the usefulness of this model compared to other approaches. Evaluation of the score flow algorithm using both transcriptome and ChIP-seq data-generated signalling pathways showed good correlation with expected cellular behaviour on both KEGG and manually generated pathways. Implementation of the algorithm as a Cytoscape plug-in allows interactive visualization and analysis of KEGG pathways as well as user-generated and curated Cytoscape pathways. Moreover, the algorithm accurately predicts gene-level and global impacts of single or multiple in silico gene knockouts. This article is freely accessible with the consent of the rights holder under a (DFG-funded) Alliance or National Licence.
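To make the idea of scores flowing through a cyclic directed graph concrete, here is a toy Python sketch. It is not the authors' algorithm: the equal split of a node's score among its successors, the signed edges, the stopping tolerance, and all names are assumptions chosen purely for illustration.

```python
from collections import defaultdict


def score_flow(edges, scores, target, max_steps=200, tol=1e-9):
    """Push node scores along directed edges until they reach `target`.

    edges:  iterable of (src, dst, sign); sign is +1 (activation) or
            -1 (inhibition). Hypothetical encoding, not from the paper.
    scores: dict mapping node -> initial score.
    Each node splits its score equally among its successors (an assumed
    stand-in for the paper's score partition). Cycles are handled by
    iterating until the circulating score falls below `tol`.
    """
    successors = defaultdict(list)
    for src, dst, sign in edges:
        successors[src].append((dst, sign))

    arrived = scores.get(target, 0.0)
    current = {n: s for n, s in scores.items() if n != target}

    for _ in range(max_steps):
        nxt = defaultdict(float)
        for node, score in current.items():
            outs = successors.get(node, [])
            for dst, sign in outs:
                share = sign * score / len(outs)
                if dst == target:
                    arrived += share  # score absorbed by the target response
                else:
                    nxt[dst] += share
        if all(abs(v) < tol for v in nxt.values()):
            break
        current = nxt
    return arrived
```

For example, with edges `A -> B`, `B -> A` and `B -> T`, a unit score placed on `A` circulates through the feedback loop and is gradually absorbed by the target `T`.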

    Special issue on microscopic image processing


    Synthesis of novel indole-isoxazole hybrids and evaluation of their cytotoxic activities on hepatocellular carcinoma cell lines

    Background: Liver cancer is predicted to be the sixth most diagnosed cancer globally and the fourth leading cause of cancer deaths. In this study, a series of indole-3-isoxazole-5-carboxamide derivatives were designed, synthesized, and evaluated for their anticancer activities. The chemical structures of the final compounds and intermediates were characterized using IR, HRMS, 1H-NMR and 13C-NMR spectroscopy and elemental analysis. Results: Cytotoxic activity was assessed against Huh7, MCF7 and HCT116 cancer cell lines using the sulforhodamine B (SRB) assay. Some compounds showed potent anticancer activities, and three of them were chosen for further evaluation on liver cancer cell lines based on the SRB assay and real-time cell growth tracking analysis. The compounds were shown to cause arrest in the G0/G1 phase in Huh7 cells and a significant decrease in CDK4 levels. A good correlation was obtained between the theoretical predictions of bioavailability, using Molinspiration calculation and Lipinski's rule of five, and experimental verification. These investigations reveal that the indole-isoxazole hybrid system has potential for the development of novel anticancer agents. Conclusions: This study has provided data that will form the basis of further studies that aim to optimize both the design and synthesis of novel compounds that have higher anticancer activities
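Lipinski's rule of five, mentioned in this abstract, flags compounds as likely to have poor oral bioavailability when they exceed several simple descriptor thresholds. The Python sketch below checks precomputed descriptors against those thresholds; the class and function names are illustrative, the descriptor values would have to come from a cheminformatics tool, and allowing one violation is a common convention assumed here rather than something stated in the abstract.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    """Precomputed molecular descriptors (hypothetical container)."""
    mol_weight: float        # molecular weight in daltons
    log_p: float             # octanol-water partition coefficient
    h_bond_donors: int
    h_bond_acceptors: int


def lipinski_violations(d: Descriptors) -> int:
    """Count how many of the four rule-of-five thresholds are exceeded."""
    return sum([
        d.mol_weight > 500,
        d.log_p > 5,
        d.h_bond_donors > 5,
        d.h_bond_acceptors > 10,
    ])


def passes_rule_of_five(d: Descriptors, max_violations: int = 1) -> bool:
    """True if the compound has at most `max_violations` violations."""
    return lipinski_violations(d) <= max_violations
```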

    Identification of relative protein bands in Polyacrylamide Gel Electrophoresis (PAGE) using multiresolution snake algorithm

    Polyacrylamide Gel Electrophoresis (PAGE) is one of the most widely used techniques in protein research. In the protein purification process, it is important to determine the efficiency of each purification step in terms of the percentage of the protein of interest found in the protein mixture. This study provides a rapid and reliable way to determine this percentage. The region of interest containing the protein is detected using the snake algorithm. The iterative snake algorithm is implemented in a multiresolution framework. The snake is initialized on a low-resolution image. Then, the final position of the snake at low resolution is used as the initial position in the higher-resolution image. Finally, the area of the protein is estimated as the area enclosed by the final position of the snake
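Two steps of this coarse-to-fine procedure are simple enough to sketch in Python: carrying a contour from a low-resolution image to a higher-resolution one, and measuring the area the final contour encloses. The snake refinement itself is omitted, and the function names and the factor-of-two scaling between levels are assumptions, not details from the paper.

```python
def upscale_contour(points, factor=2.0):
    """Map contour coordinates from a coarse image to a finer one.

    Assumes the finer image is `factor` times larger along each axis.
    """
    return [(x * factor, y * factor) for x, y in points]


def enclosed_area(points):
    """Area inside a closed polygonal contour (shoelace formula)."""
    n = len(points)
    twice_area = 0.0
    for i in range(n):
        x0, y0 = points[i]
        x1, y1 = points[(i + 1) % n]  # wrap around to close the contour
        twice_area += x0 * y1 - x1 * y0
    return abs(twice_area) / 2.0
```

In a full pipeline, the upscaled contour would serve as the initial snake position at the next resolution level, and `enclosed_area` would be applied to the contour obtained at the finest level.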

    Protein domain-based prediction of drug/compound–target interactions and experimental validation on LIM kinases

    Predictive approaches such as virtual screening have been used in drug discovery with the objective of reducing developmental time and costs. Current machine learning and network-based approaches have issues related to generalization, usability, or model interpretability, especially due to the complexity of target proteins’ structure/function, and bias in system training datasets. Here, we propose a new method “DRUIDom” (DRUg Interacting Domain prediction) to identify bio-interactions between drug candidate compounds and targets by utilizing the domain modularity of proteins, to overcome problems associated with current approaches. DRUIDom is composed of two methodological steps. First, ligands/compounds are statistically mapped to structural domains of their target proteins, with the aim of identifying their interactions. As such, other proteins containing the same mapped domain or domain pair become new candidate targets for the corresponding compounds. Next, a million-scale dataset of small molecule compounds, including those mapped to domains in the previous step, is clustered based on molecular similarities, and the domain associations are propagated to other compounds within the same clusters. Experimentally verified bioactivity data points, obtained from public databases, are meticulously filtered to construct datasets of active/interacting and inactive/non-interacting drug/compound–target pairs (~2.9M data points), and used as training data for calculating parameters of compound–domain mappings, which led to 27,032 high-confidence associations between 250 domains and 8,165 compounds, and a finalized output of ~5 million new compound–protein interactions. DRUIDom is experimentally validated by syntheses and bioactivity analyses of compounds predicted to target LIM-kinase proteins, which play critical roles in the regulation of cell motility, cell cycle progression, and differentiation through actin filament dynamics. We showed that LIMK-inhibitor-2 and its derivatives significantly block cancer cell migration through inhibition of LIMK phosphorylation and the downstream protein cofilin. One of the derivative compounds (LIMKi-2d) was identified as a promising candidate due to its action on resistant Mahlavu liver cancer cells. The results demonstrated that DRUIDom can be exploited to identify drug candidate compounds for intended targets and to predict new target proteins based on the defined compound–domain relationships. Datasets, results, and the source code of DRUIDom are fully available at: https://github.com/cansyl/DRUIDom
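The core inference step described in this abstract, that a compound mapped to a domain becomes a candidate ligand for every protein containing that domain, can be illustrated with a small Python sketch. This is not the DRUIDom code (which is available at the repository above): the function name, the data layout, and the identifiers in the usage example are hypothetical, and the statistical mapping, domain pairs, and compound clustering steps are left out.

```python
def propagate_targets(compound_domains, protein_domains):
    """Predict compound-protein pairs from shared domains.

    compound_domains: dict mapping compound id -> set of associated domain ids
    protein_domains:  dict mapping protein id  -> set of domain ids it contains
    Returns the set of (compound, protein) pairs sharing at least one domain.
    """
    predictions = set()
    for compound, c_domains in compound_domains.items():
        for protein, p_domains in protein_domains.items():
            if c_domains & p_domains:  # non-empty intersection
                predictions.add((compound, protein))
    return predictions
```

With a hypothetical input such as `{"cmpd1": {"kinase_dom"}}` and `{"protA": {"kinase_dom", "LIM_dom"}, "protB": {"SH2_dom"}}`, only the pair `("cmpd1", "protA")` would be predicted.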

    CROssBAR: comprehensive resource of biomedical relations with knowledge graph representations

    Systemic analysis of available large-scale biological/biomedical data is critical for studying biological mechanisms and for developing novel and effective treatment approaches against diseases. However, different layers of the available data are produced using different technologies and are scattered across individual computational resources without any explicit connections to each other, which hinders extensive and integrative multi-omics-based analysis. We aimed to address this issue by developing a new data integration/representation methodology and applying it to construct a biological data resource. CROssBAR is a comprehensive system that integrates large-scale biological/biomedical data from various resources and stores them in a NoSQL database. CROssBAR is enriched with deep-learning-based prediction of relationships between numerous data entries, followed by rigorous analysis of the enriched data to obtain biologically meaningful modules. These complex sets of entities and relationships are displayed to users via easy-to-interpret, interactive knowledge graphs within an open-access service. CROssBAR knowledge graphs incorporate relevant genes-proteins, molecular interactions, pathways, phenotypes, diseases, as well as known/predicted drugs and bioactive compounds, and they are constructed on-the-fly based on simple non-programmatic user queries. These intensely processed heterogeneous networks are expected to aid systems-level research, especially to infer biological mechanisms in relation to genes, proteins, their ligands, and diseases

    Identification of an mRNA isoform switch for HNRNPA1 in breast cancers.

    Roles of HNRNPA1 are beginning to emerge in cancers; however, mechanisms causing deregulation of HNRNPA1 function remain elusive. Here, we describe an isoform switch between the 3′-UTR isoforms of HNRNPA1 in breast cancers. We show that the dominantly expressed isoform in mammary tissue has a short half-life. In breast cancers, this isoform is downregulated in favor of a stable isoform. The stable isoform is expressed more in breast cancers, and more HNRNPA1 protein is synthesized from this isoform. High HNRNPA1 protein levels correlate with poor survival in patients. In support of this, silencing of HNRNPA1 causes a reversal in neoplastic phenotypes, including proliferation, clonogenic potential, migration, and invasion. In addition, silencing of HNRNPA1 results in the downregulation of microRNAs that map to intragenic regions. Among these miRNAs, miR-21 is known for its transcriptional upregulation in breast and numerous other cancers. Altogether, the cancer-specific isoform switch we describe here for HNRNPA1 emphasizes the need to study gene expression at the isoform level in cancers to identify novel cases of oncogene activation